Directional Pruning of Deep Neural Networks

Neural Information Processing Systems

Motivated by the fact that stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional pruning method that searches for a sparse minimizer in or close to that flat region. The proposed method requires neither retraining nor expert knowledge of the target sparsity level. To overcome the prohibitive cost of estimating the flat directions, we propose a carefully tuned $\ell_1$ proximal gradient algorithm which provably achieves directional pruning with a small learning rate after sufficient training. Empirically, our solution is among the best-performing of many existing pruning methods in the highly sparse regime (92% sparsity) on ResNet50 with ImageNet, while requiring only slightly more wall time and memory than SGD. Using VGG16 and the wide ResNet 28x10 on CIFAR-10 and CIFAR-100, we demonstrate that our solution reaches the same minimum valley as SGD, and that the minima found by our solution and by SGD do not deviate in directions that impact the training loss.
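To make the $\ell_1$ proximal gradient idea concrete, here is a minimal generic sketch of an ISTA-style step (not the paper's exact tuned algorithm): a gradient step followed by soft-thresholding, which zeroes small coordinates and thereby produces sparsity without a separate pruning pass.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_grad_step(w, grad, lr, lam):
    """One l1 proximal gradient step: gradient descent followed by shrinkage.

    Coordinates whose magnitude falls below lr * lam after the gradient step
    are set exactly to zero.
    """
    return soft_threshold(w - lr * grad, lr * lam)

# toy quadratic loss f(w) = 0.5 * ||w - t||^2 with target t
t = np.array([2.0, 0.05, -1.5, 0.01])
w = np.zeros_like(t)
for _ in range(200):
    grad = w - t
    w = prox_grad_step(w, grad, lr=0.1, lam=0.5)
print(w)  # the small coordinates of t are pruned to exactly 0
```

On this toy problem the iterates converge to the soft-thresholded target, so the two near-zero coordinates end up exactly zero while the large ones shrink by the regularization amount.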



Review for NeurIPS paper: Directional Pruning of Deep Neural Networks

Neural Information Processing Systems

Additional Feedback: My overall sense about this paper is that there is an interesting result here that would be significantly improved if the relationship to OBS were clarified, if \mathcal{P}_0 were clarified, and if the empirical results were stronger. In particular, on the last point, given that the method does not seem to work particularly well with standard hyperparameters, I am less enthusiastic about directional pruning as a valuable pruning definition, even though it seems natural. The results presented in the main body of the paper, obtained with non-standard hyperparameters and reduced accuracy for the initial network, give me pause as well, so perhaps the methodology of these experiments could also be improved. An alternative narrative that would make for a stronger result -- if true -- would be to map the OBS objective to solutions of this algorithm; in that case a reader need not be concerned about whether directional pruning itself is a valuable concept, as OBS is already well established.


Review for NeurIPS paper: Directional Pruning of Deep Neural Networks

Neural Information Processing Systems

Thank you for your submission. There was much internal discussion about the paper. R3 championed the paper and appreciated that the method has theoretical footing. R1 and R2 raised critical issues with the empirical evaluation. R1 correctly highlighted that the experiments do not include important baselines. Additionally, the evaluation was done on nonstandard learning rate schedules, and the results on the standard learning rate schedule are not fully convincing (the author feedback did not resolve this issue).



Structured Directional Pruning via Perturbation Orthogonal Projection

Yinchuan Li, Xiaofeng Liu, Yunfeng Shao, Qing Wang, Yanhui Geng

arXiv.org Machine Learning

Structured pruning is an effective compression technique for reducing the computation of neural networks; it is usually achieved by adding perturbations that remove network parameters at the cost of a slight increase in training loss. A more reasonable approach is to find a sparse minimizer along the flat minimum valley found by optimizers such as stochastic gradient descent, which keeps the training loss constant. To achieve this goal, we propose structured directional pruning, based on orthogonally projecting the perturbations onto the flat minimum valley. We also propose a fast solver, sDprun, and further prove that it achieves directional pruning asymptotically after sufficient training. Experiments using VGG-Net and ResNet on the CIFAR-10 and CIFAR-100 datasets show that our method obtains state-of-the-art pruned accuracy (93.97% on the VGG16/CIFAR-10 task) without retraining. Experiments using a DNN, VGG-Net and WRN28x10 on the MNIST, CIFAR-10 and CIFAR-100 datasets demonstrate that our method performs structured directional pruning, reaching the same minimum valley as the optimizer.
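The projection step can be sketched as follows, assuming the flat directions are given as an orthonormal basis (in practice they correspond to near-zero-curvature directions of the loss; this toy example hands the basis in directly rather than estimating it):

```python
import numpy as np

def project_onto_flat_subspace(delta, flat_basis):
    """Orthogonally project a pruning perturbation onto the flat subspace.

    flat_basis: (d, k) matrix whose orthonormal columns span directions of
    (near-)zero curvature of the training loss; moving along them leaves the
    loss approximately unchanged to second order.
    """
    return flat_basis @ (flat_basis.T @ delta)

# toy 3-d example: the flat valley is the x-y plane, curvature lies along z
flat_basis = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.0, 0.0]])
delta = np.array([0.3, -0.2, 0.7])   # raw perturbation that would zero some weights
safe_delta = project_onto_flat_subspace(delta, flat_basis)
print(safe_delta)  # -> [0.3, -0.2, 0.0]: the loss-increasing z-component is removed
```

The projection discards exactly the component of the perturbation that would climb out of the valley, which is what keeps the training loss (approximately) constant.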


Directional Pruning of Deep Neural Networks

Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng

arXiv.org Machine Learning

Motivated by the fact that stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional pruning method that searches for a sparse minimizer in or close to that flat region. The proposed method requires neither retraining nor expert knowledge of the target sparsity level. To overcome the prohibitive cost of estimating the flat directions, we propose a carefully tuned $\ell_1$ proximal gradient algorithm which provably achieves directional pruning with a small learning rate after sufficient training. Empirically, our solution is among the best-performing of many existing pruning methods in the highly sparse regime (92% sparsity) on ResNet50 with ImageNet, while requiring only slightly more wall time and memory than SGD. Using VGG16 and the wide ResNet 28x10 on CIFAR-10 and CIFAR-100, we demonstrate that our solution reaches the same minimum valley as SGD, and that the minima found by our solution and by SGD do not deviate in directions that impact the training loss. The code that reproduces the results of this paper is available at https://github.com/donlan2710/gRDA-Optimizer/tree/master/directional_pruning.
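The repository name suggests the solver is a gRDA (generalized regularized dual averaging) optimizer. Below is a rough NumPy sketch of one gRDA-style step; the threshold schedule g(n, lr) = c * sqrt(lr) * (n * lr)**mu and the default constants are assumptions for illustration, not a verified reimplementation of the released code.

```python
import numpy as np

def grda_step(v, grad, n, lr, c=0.005, mu=0.51):
    """One gRDA-style update step (sketch under assumed schedule).

    v accumulates plain SGD information (initialize it at the initial
    weights); the weights w are its soft-thresholded image, so sparsity
    grows as the threshold g increases with the step count n.
    """
    v = v - lr * grad                      # dual (gradient-accumulation) update
    g = c * lr ** 0.5 * (n * lr) ** mu     # slowly growing l1 threshold (assumed)
    w = np.sign(v) * np.maximum(np.abs(v) - g, 0.0)
    return v, w

# toy usage: with a zero gradient the update only applies the growing threshold
v = np.array([1.0, 0.001])                 # stand-in for accumulated weights
v, w = grda_step(v, np.zeros(2), n=1000, lr=0.1)
print(w)  # the tiny coordinate is thresholded to exactly 0
```

Because the threshold grows with n while v tracks an SGD trajectory, small coordinates are progressively pinned to zero, which matches the abstract's claim of pruning without a separate retraining phase.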